23 research outputs found
ActiveNeRF: Learning where to See with Uncertainty Estimation
Recently, Neural Radiance Fields (NeRF) has shown promising performance on
reconstructing 3D scenes and synthesizing novel views from a sparse set of 2D
images. Albeit effective, the performance of NeRF is highly influenced by the
quality of training samples. With limited posed images from the scene, NeRF
fails to generalize well to novel views and may collapse to trivial solutions
in unobserved regions. This makes NeRF impractical under resource-constrained
scenarios. In this paper, we present a novel learning framework, ActiveNeRF,
aiming to model a 3D scene with a constrained input budget. Specifically, we
first incorporate uncertainty estimation into a NeRF model, which ensures
robustness under few observations and provides an interpretation of how NeRF
understands the scene. On this basis, we propose to supplement the existing
training set with newly captured samples based on an active learning scheme. By
evaluating the reduction of uncertainty given new inputs, we select the samples
that bring the most information gain. In this way, the quality of novel view
synthesis can be improved with minimal additional resources. Extensive
experiments validate the performance of our model on both realistic and
synthetic scenes, especially with scarcer training data. Code will be released
at https://github.com/LeapLabTHU/ActiveNeRF.
Comment: Accepted by ECCV202
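The selection step described in the ActiveNeRF abstract above (choosing the new samples that reduce uncertainty the most) can be sketched roughly as follows. This is not the authors' implementation: the function name `select_views`, the per-ray variance lists, and the use of a summed variance reduction as the information-gain score are all assumptions made for illustration. In practice the variances would come from the uncertainty-aware model rather than being supplied by hand.

```python
def select_views(prior_var, posterior_var_by_view, k=1):
    """Pick the k candidate views with the largest total variance reduction.

    prior_var: per-ray variances before adding any new view (hypothetical input).
    posterior_var_by_view: maps a candidate view id to the per-ray variances
        expected after adding that view (hypothetical input).
    """
    gains = {}
    for view, post_var in posterior_var_by_view.items():
        if len(post_var) != len(prior_var):
            raise ValueError("variance lists must have the same length")
        # Score a candidate by how much uncertainty it removes overall.
        gains[view] = sum(p - q for p, q in zip(prior_var, post_var))
    # Highest gain first; ties keep the dictionary's insertion order.
    return sorted(gains, key=gains.get, reverse=True)[:k]


if __name__ == "__main__":
    prior = [1.0, 1.0, 1.0]
    candidates = {
        "view_a": [0.9, 0.9, 0.9],
        "view_b": [0.2, 1.0, 0.5],
        "view_c": [1.0, 1.0, 1.0],
    }
    print(select_views(prior, candidates, k=2))
```

Ranking by a single scalar gain keeps the sketch simple; a real system would also need to account for the cost of capturing each view.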
Anytime Stereo Image Depth Estimation on Mobile Devices
Many applications of stereo depth estimation in robotics require the
generation of accurate disparity maps in real time under significant
computational constraints. Current state-of-the-art algorithms force a choice
between either generating accurate mappings at a slow pace, or quickly
generating inaccurate ones, and additionally these methods typically require
far too many parameters to be usable on power- or memory-constrained devices.
Motivated by these shortcomings, we propose a novel approach for disparity
prediction in the anytime setting. In contrast to prior work, our end-to-end
learned approach can trade off computation and accuracy at inference time.
Depth estimation is performed in stages, during which the model can be queried
at any time to output its current best estimate. Our final model can process
1242×375 resolution images within a range of 10-35 FPS on an NVIDIA
Jetson TX2 module with only marginal increases in error -- using two orders of
magnitude fewer parameters than the most competitive baseline. The source code
is available at https://github.com/mileyan/AnyNet.
Comment: Accepted by ICRA201
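The anytime behaviour described in the abstract above (staged estimation that can be queried for its current best estimate) can be illustrated with a minimal sketch. It is not the AnyNet code: the names `anytime_estimates` and `query`, the stage-count budget, and the toy refinement stage are assumptions for this example. A real system would use learned network stages and a wall-clock deadline.

```python
from itertools import islice


def anytime_estimates(initial, stages):
    """Yield a refined estimate after each stage, coarse to fine."""
    estimate = initial
    for stage in stages:
        estimate = stage(estimate)
        yield estimate


def query(initial, stages, max_stages):
    """Return the best estimate available after at most max_stages stages."""
    best = initial
    # islice stops pulling from the generator once the budget is used up,
    # so no stage beyond the budget is ever executed.
    for best in islice(anytime_estimates(initial, stages), max_stages):
        pass
    return best


if __name__ == "__main__":
    target = 10.0  # stand-in for the true disparity value

    def refine(estimate):
        # Toy stage: close half of the remaining gap to the target.
        return estimate + 0.5 * (target - estimate)

    stages = [refine, refine, refine]
    for budget in range(4):
        print(budget, query(0.0, stages, budget))
```

A larger budget runs more stages and returns a more accurate estimate, which mirrors the computation-accuracy trade-off at inference time.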
Learning to Weight Samples for Dynamic Early-exiting Networks
Early exiting is an effective paradigm for improving the inference efficiency
of deep networks. By constructing classifiers with varying resource demands
(the exits), such networks allow easy samples to be output at early exits,
removing the need for executing deeper layers. While existing works mainly
focus on the architectural design of multi-exit networks, the training
strategies for such models are largely left unexplored. The current
state-of-the-art models treat all samples the same during training. However,
the early-exiting behavior during testing has been ignored, leading to a gap
between training and testing. In this paper, we propose to bridge this gap by
sample weighting. Intuitively, easy samples, which generally exit early in the
network during inference, should contribute more to training early classifiers.
The training of hard samples (which mostly exit from deeper layers), however, should
be emphasized by the late classifiers. Our work proposes to adopt a weight
prediction network to weight the loss of different training samples at each
exit. This weight prediction network and the backbone model are jointly
optimized under a meta-learning framework with a novel optimization objective.
By bringing the adaptive behavior during inference into the training phase, we
show that the proposed weighting mechanism consistently improves the trade-off
between classification accuracy and inference efficiency. Code is available at
https://github.com/LeapLabTHU/L2W-DEN.
Comment: ECCV 202
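For readers unfamiliar with early exiting, the inference behaviour described in the abstract above (easy samples leave at early exits, hard samples continue to deeper ones) can be sketched as below. This illustrates only the inference-time exit rule, not the proposed weight prediction network or its meta-learning objective. The function name `early_exit_predict`, the threshold of 0.9, and the use of the maximum softmax probability as the confidence measure are assumptions for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(exit_logits, threshold=0.9):
    """Return (exit_index, predicted_class) for one sample.

    exit_logits: one list of class logits per exit, ordered shallow to deep.
    The sample leaves at the first exit whose top probability reaches the
    threshold; the last exit always produces an answer.
    """
    if not exit_logits:
        raise ValueError("at least one exit is required")
    last = len(exit_logits) - 1
    for index, logits in enumerate(exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold or index == last:
            return index, probs.index(confidence)


if __name__ == "__main__":
    easy = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    hard = [[0.1, 0.0, 0.0], [0.0, 4.0, 0.0]]
    print(early_exit_predict(easy))  # confident at the first exit
    print(early_exit_predict(hard))  # falls through to the last exit
```

In a real network the logits of deeper exits would be computed only if the sample has not already left, which is where the efficiency gain comes from.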
Large expert-curated database for benchmarking document similarity detection in biomedical literature search
Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
Peer reviewed
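One of the baselines named in the abstract above, Term Frequency-Inverse Document Frequency, can be illustrated with a small ranking sketch. This is not the consortium's pipeline: the whitespace tokenizer, the smoothed IDF formula, the function names, and the toy titles are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF (an assumption) keeps every weight positive.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(seed, candidates):
    """Return candidate indices ordered from most to least similar to seed."""
    vectors = tfidf_vectors([seed] + candidates)
    scores = [cosine(vectors[0], vec) for vec in vectors[1:]]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    seed = "protein folding dynamics"
    candidates = [
        "soil erosion in river basins",
        "protein folding kinetics and dynamics",
    ]
    print(rank_by_similarity(seed, candidates))
```

Comparing the rankings produced by different scoring functions on the same seed is one simple way to see why the abstract suggests that a hybrid method may be needed.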
Extreme Masking for Learning Instance and Distributed Visual Representations
The paper presents a scalable approach for learning distributed
representations over individual tokens and a holistic instance representation
simultaneously. We use self-attention blocks to represent distributed tokens,
followed by cross-attention blocks to aggregate the holistic instance. The core
of the approach is the use of extremely large token masking (75%-90%) as the
data augmentation for supervision. Our model, named ExtreMA, follows the plain
BYOL approach where the instance representation from the unmasked subset is
trained to predict that from the intact input. Learning requires the model to
capture informative variations in an instance, instead of encouraging
invariances. The paper makes three contributions: 1) Random masking is a strong
and computationally efficient data augmentation for learning generalizable
attention representations. 2) With multiple sampling per instance, extreme
masking greatly speeds up learning and hungers for more data. 3) Distributed
representations can be learned from the instance supervision alone, unlike
per-token supervisions in masked modeling.
Comment: Technical Report
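The masking step described in the ExtreMA abstract above (hiding 75%-90% of tokens, with multiple samplings per instance) can be sketched as follows. This covers only the choice of visible and masked token positions, not the attention blocks or the BYOL-style training. The function name `random_mask_indices`, the token count of 196, and the seeded generator are assumptions for this example.

```python
import random


def random_mask_indices(num_tokens, mask_ratio, rng):
    """Split token positions into (visible, masked) index lists.

    mask_ratio is the fraction of tokens to hide, e.g. 0.75 to 0.9.
    At least one token is always kept visible.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_keep = max(1, round(num_tokens * (1.0 - mask_ratio)))
    order = list(range(num_tokens))
    rng.shuffle(order)  # uniform random permutation of positions
    visible = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return visible, masked


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    # Multiple sampling per instance: draw several masks for the same input.
    for _ in range(3):
        visible, masked = random_mask_indices(196, 0.75, rng)
        print(len(visible), "visible,", len(masked), "masked")
```

Because each draw keeps only a small visible subset, several masked views of one instance can be processed for roughly the cost of a single full-length input.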